POEM: 1-Bit Point-Wise Operations Based on E-M for Point Cloud Processing
Algorithm 12 POEM training. L is the loss function (the sum of L_S and L_R) and N is the number of layers. Binarize() binarizes the filters using the binarization of Eq. 6.36, and Update() updates the parameters according to our update scheme.
Input: a minibatch of inputs and their labels, unbinarized weights w, scale factor α, learning rate η.
Output: updated unbinarized weights w^{t+1}, updated scale factor α^{t+1}.
1: {1. Computing gradients with respect to the parameters:}
2: {1.1. Forward propagation:}
3: for i = 1 to N do
4:     bw_i ← Binarize(w_i) (using Eq. 6.36)
5:     Bi-FC feature calculation using Eq. 6.87 – 6.72
6:     Loss calculation using Eq. 6.88 – 6.44
7: end for
8: {1.2. Backward propagation:}
9: for i = N to 1 do
10:     {Note that the gradients are not binary.}
11:     Computing δ_w using Eq. 6.89 – 6.59
12:     Computing δ_α using Eq. 6.60 – 6.62
13:     Computing δ_p using Eq. 6.63 – 6.64
14: end for
15: {2. Accumulating the parameter gradients:}
16: for i = 1 to N do
17:     w^{t+1} ← Update(δ_w, η) (using Eq. 6.89)
18:     α^{t+1} ← Update(δ_α, η) (using Eq. 6.61)
19:     p^{t+1} ← Update(δ_p, η) (using Eq. 6.64)
20:     η^{t+1} ← Update(η) according to the learning rate schedule
21: end for
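To make the control flow of Algorithm 12 concrete, the sketch below runs one training iteration for a single Bi-FC layer in NumPy. It is only an illustration of the forward/backward/update structure under stated assumptions: sign binarization stands in for Eq. 6.36, a random tensor stands in for the softmax-loss gradient, the reconstruction term is taken as $\tfrac{1}{2}\|w - \alpha \circ bw\|^2$, the EM term of Eq. 6.58 is shown separately in the sketch after Eq. 6.59, and the update of p (Eqs. 6.63 – 6.64) is not shown.

```python
# A minimal NumPy sketch of one iteration of Algorithm 12 for a single
# Bi-FC layer.  The helper name binarize, the layer shapes, and the random
# stand-in for the softmax-loss gradient are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(16, 8))   # latent (unbinarized) weights
alpha = np.ones(16)                      # per-output-channel scale factors
eta, lam = 0.01, 0.1                     # learning rate and L_R weight

def binarize(w):
    """Sign binarization, standing in for Eq. 6.36."""
    return np.where(w >= 0, 1.0, -1.0)

# 1.1 Forward propagation
x = rng.normal(size=(4, 8))              # a minibatch of point-wise features
bw = binarize(w)
y = x @ (alpha[:, None] * bw).T          # Bi-FC feature calculation

# 1.2 Backward propagation (the gradients are real-valued, not binary)
grad_y = rng.normal(size=y.shape)        # stand-in for dL_S/dy from backprop
g = grad_y.T @ x                         # dL_S/d(alpha * bw)
recon = w - alpha[:, None] * bw          # residual of the reconstruction term
delta_w = alpha[:, None] * g + lam * recon            # Eq. 6.58 without the EM term
delta_alpha = (np.sum(g * bw, axis=1)
               + lam * np.sum(recon * bw, axis=1))    # Eq. 6.60 with Eq. 6.62

# 2. Accumulating the parameter updates
w = w - eta * delta_w                      # Update(delta_w, eta)
alpha = np.abs(alpha - eta * delta_alpha)  # Eq. 6.61
```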
Then, we optimize $w_i^j$ as
$$
\delta_{w_i^j} = \frac{\partial L_S}{\partial w_i^j} + \lambda \frac{\partial L_R}{\partial w_i^j} + \tau \, EM(w_i^j),
\qquad (6.58)
$$
where $\tau$ is a hyperparameter that controls the contribution of the Expectation-Maximization operator $EM(w_i^j)$, which is defined as
$$
EM(w_i^j) =
\begin{cases}
\sum_{k=1}^{2} \hat{\xi}_i^{jk} \left(\hat{\mu}_i^k - w_i^j\right), & \hat{\mu}_i^1 < w_i^j < \hat{\mu}_i^2,\\
0, & \text{else}.
\end{cases}
\qquad (6.59)
$$
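For concreteness, the following NumPy sketch implements the EM operator of Eq. 6.59 for one filter. The two component means $\hat{\mu}_i^k$ and the responsibilities $\hat{\xi}_i^{jk}$ are assumed to come from an E-step over the layer's weights; the shared-variance Gaussian E-step used here (the helper `e_step`, with assumed means and variance) is an illustrative choice, not the chapter's exact estimation procedure.

```python
# A minimal sketch of the EM operator of Eq. 6.59; e_step and its fixed
# means/variance are illustrative assumptions.
import numpy as np

def em_operator(w, mu, xi):
    """Eq. 6.59 for one filter's latent weights.

    w  : (n,)   latent weights
    mu : (2,)   component means with mu[0] < mu[1]
    xi : (n, 2) responsibilities of each weight under the two components
    """
    pull = xi[:, 0] * (mu[0] - w) + xi[:, 1] * (mu[1] - w)
    inside = (w > mu[0]) & (w < mu[1])   # only weights strictly between the means
    return np.where(inside, pull, 0.0)

def e_step(w, mu, sigma=0.1):
    """Illustrative E-step: responsibilities under two Gaussians with shared variance."""
    logp = -((w[:, None] - mu[None, :]) ** 2) / (2.0 * sigma ** 2)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

# Usage: add the EM term to the other gradient components as in Eq. 6.58.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.3, size=64)
mu = np.array([-0.25, 0.25])                       # assumed component means
xi = e_step(w, mu)
tau = 0.01
delta_w_task = rng.normal(size=64)                 # stand-in for dL_S/dw + lam*dL_R/dw
delta_w = delta_w_task + tau * em_operator(w, mu, xi)   # Eq. 6.58
```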
Updating $\alpha_i$: We further update the scale factor $\alpha_i$ with $w_i$ fixed. $\delta_{\alpha_i}$ is defined as the gradient of $\alpha_i$, and we have
$$
\delta_{\alpha_i} = \frac{\partial L_S}{\partial \alpha_i} + \lambda \frac{\partial L_R}{\partial \alpha_i},
\qquad (6.60)
$$
$$
\alpha_i \leftarrow \left|\alpha_i - \eta \, \delta_{\alpha_i}\right|,
\qquad (6.61)
$$
where $\eta$ is the learning rate. The gradient derived from the softmax loss can easily be calculated via backpropagation. Based on Eq. 6.44, we have
$$
\frac{\partial L_R}{\partial \alpha_i} = (w_i - \alpha_i \circ bw_i) \cdot bw_i.
\qquad (6.62)
$$
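Putting Eqs. 6.60 – 6.62 together, a scale-factor update step can be sketched as follows; the helper name `update_alpha`, the toy shapes, and the supplied softmax-loss gradient are illustrative assumptions rather than the chapter's implementation.

```python
# A minimal sketch of the alpha update of Eqs. 6.60-6.62; shapes and the
# supplied softmax-loss gradient are illustrative.
import numpy as np

def update_alpha(alpha, w, bw, dLS_dalpha, lam=0.1, eta=0.01):
    """One update step for per-channel scale factors.

    alpha : (c,)    scale factors
    w     : (c, n)  latent (unbinarized) weights
    bw    : (c, n)  binarized weights
    """
    # Eq. 6.62: dL_R/dalpha_i = (w_i - alpha_i * bw_i) . bw_i
    dLR_dalpha = np.sum((w - alpha[:, None] * bw) * bw, axis=1)
    delta_alpha = dLS_dalpha + lam * dLR_dalpha    # Eq. 6.60
    return np.abs(alpha - eta * delta_alpha)       # Eq. 6.61: alpha stays non-negative

# Usage with toy shapes (c = 4 output channels, n = 8 weights per channel).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(4, 8))
bw = np.where(w >= 0, 1.0, -1.0)
alpha = update_alpha(np.ones(4), w, bw, dLS_dalpha=rng.normal(size=4))
```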